When applied to the autonomous vehicle setting, action recognition can help enrich an environment model's understanding of the world and improve plans for future action. Towards improving autonomous vehicle decision-making, in this work we propose a novel two-stage online action recognition system, termed RADAC. RADAC frames the problem of active agent detection and combines ideas about actor-context relations from human activity recognition in a straightforward two-stage pipeline for action detection and classification. We show that our proposed scheme can outperform the baseline on the ICCV 2021 ROAD challenge dataset, and by deploying it on a real vehicle platform, we demonstrate how a higher-order understanding of agent actions in an environment can improve decision-making on a real autonomous vehicle.
Historically, trajectory planning and control have been separated into two modules within autonomous driving stacks. Trajectory planning focuses on higher-level tasks such as avoiding obstacles and staying on the road surface, while the controller tries its best to follow an ever-changing reference trajectory. We argue that this separation is (1) flawed, because the planned trajectory may not match what the controller can execute, and (2) unnecessary, owing to the flexibility of the model predictive control (MPC) paradigm. Instead, in this paper we present a unified MPC-based trajectory planning and control scheme that guarantees feasibility with respect to road boundaries and the static and dynamic environment, and enforces passenger comfort constraints. The scheme is rigorously evaluated in a variety of scenarios designed to demonstrate the efficacy of the optimal control problem (OCP) design and the real-time solution method. Prototype code will be released at https://github.com/watonomous/control.
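The unified planning-and-control idea above can be illustrated with a toy receding-horizon MPC. The sketch below brute-forces a short horizon over a small discrete control set for a 1-D double integrator, with a position bound standing in for road boundaries and a quadratic input penalty standing in for comfort; the dynamics, costs, bounds, and horizon are all illustrative assumptions, not the paper's OCP.

```python
import itertools

def mpc_step(x, v, target, horizon=5, dt=0.5,
             u_set=(-1.0, 0.0, 1.0), pos_bounds=(-10.0, 10.0)):
    """Brute-force MPC for a 1-D double integrator: minimise
    sum of (position error)^2 + 0.1 * u^2 over the horizon,
    subject to position staying inside pos_bounds (a stand-in
    for road boundaries). Returns the first control of the
    best feasible sequence."""
    best_u0, best_cost = 0.0, float("inf")
    for seq in itertools.product(u_set, repeat=horizon):
        px, pv, cost, feasible = x, v, 0.0, True
        for u in seq:
            pv += u * dt          # velocity update
            px += pv * dt         # position update
            if not (pos_bounds[0] <= px <= pos_bounds[1]):
                feasible = False  # violates "road boundary"
                break
            cost += (px - target) ** 2 + 0.1 * u * u
        if feasible and cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

# Receding horizon: start at rest at x = 0, drive toward target 3.
x, v = 0.0, 0.0
for _ in range(20):
    u = mpc_step(x, v, target=3.0)
    v += u * 0.5
    x += v * 0.5
```

A real implementation would solve a continuous OCP with a nonlinear programming solver rather than enumerating control sequences; enumeration is used here only to keep the sketch dependency-free.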
The Government of Kerala had increased the frequency of supply of free food kits owing to the pandemic; however, these kits were static and not indicative of the personal preferences of the consumers. This paper conducts a comparative analysis of various clustering techniques on a scaled-down version of a real-world dataset obtained through a conjoint analysis-based survey. Clustering carried out by centroid-based methods such as k-means is analyzed, the results are plotted alongside those of SVD, and a conclusion is reached as to which of the two is better. Once the clusters have been formulated, commodities are decided upon for each cluster. Clustering is further enhanced by reassignment based on a specific cluster-loss threshold. The most efficacious clustering technique for designing a food kit tailored to the needs of individuals is thus obtained.
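A minimal sketch of the centroid-based step described above, using a dependency-free k-means on hypothetical two-commodity preference scores (the data, k, and the first-k-points initialisation are illustrative assumptions; the paper's reassignment-by-loss-threshold refinement is omitted):

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute centroids, repeated for a fixed number of rounds."""
    centroids = points[:k]  # deterministic init for the sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of hypothetical (rice, wheat) preference scores.
pts = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
       (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
cents, clusters = kmeans(pts, k=2)
```

Each resulting cluster would then be assigned its own commodity list, per the paper's pipeline.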
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
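The divergence frontiers underlying these scores can be sketched in a few lines: for quantized histograms P and Q, each mixture weight λ yields one frontier point (KL(Q‖R), KL(P‖R)) with R = λP + (1−λ)Q, capturing the two error types. The histograms below are made up for illustration; MAUVE additionally summarises the frontier (e.g., by its area) and generalises to other f-divergences.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence_frontier(p, q, lambdas):
    """For each mixture weight lam, form r = lam*p + (1-lam)*q and
    record the frontier point (KL(q||r), KL(p||r))."""
    pts = []
    for lam in lambdas:
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        pts.append((kl(q, r), kl(p, r)))
    return pts

p = [0.7, 0.2, 0.1]   # e.g., a quantized histogram of human text features
q = [0.3, 0.4, 0.3]   # e.g., a quantized histogram of model text features
frontier = divergence_frontier(p, q, [i / 10 for i in range(1, 10)])
```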
Linguists distinguish between novel and conventional metaphor, a distinction which the metaphor detection task in NLP does not take into account. Instead, metaphoricity is formulated as a property of a token in a sentence, regardless of metaphor type. In this paper, we investigate the limitations of treating conventional metaphors in this way, and advocate for an alternative which we name 'metaphorical polysemy detection' (MPD). In MPD, only conventional metaphoricity is treated, and it is formulated as a property of word senses in a lexicon. We develop the first MPD model, which learns to identify conventional metaphors in the English WordNet. To train it, we present a novel training procedure that combines metaphor detection with word sense disambiguation (WSD). For evaluation, we manually annotate metaphor in two subsets of WordNet. Our model significantly outperforms a strong baseline based on a state-of-the-art metaphor detection model, attaining an ROC-AUC score of .78 (compared to .65) on one of the sets. Additionally, when paired with a WSD model, our approach outperforms a state-of-the-art metaphor detection model at identifying conventional metaphors in text (.659 F1 compared to .626).
A widely acknowledged shortcoming of WordNet is that it lacks a distinction between word meanings which are systematically related (polysemy), and those which are coincidental (homonymy). Several previous works have attempted to fill this gap, by inferring this information using computational methods. We revisit this task, and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem using clustering methods; by contrast, our method works by linking WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions based on their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of .97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.
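The definition-pairing step can be sketched as nearest-neighbour matching in embedding space. The toy 3-d vectors and sense labels below are invented for illustration; the paper uses definition embeddings produced by a Transformer model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def align(wn_embs, oed_embs):
    """Pair each WordNet sense embedding with its most similar
    OED definition embedding (greedy nearest neighbour)."""
    return {w: max(oed_embs, key=lambda d: cosine(wn_embs[w], oed_embs[d]))
            for w in wn_embs}

# Hypothetical embeddings for two senses of "bank".
wn = {"bank.n.01": (0.9, 0.1, 0.0), "bank.n.02": (0.0, 0.2, 0.9)}
oed = {"bank: river edge": (1.0, 0.0, 0.1),
       "bank: institution": (0.1, 0.1, 1.0)}
pairs = align(wn, oed)
```

Once senses are linked, the OED's grouping of senses under shared versus distinct etymologies supplies the homonymy labels.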
近年来,基于深度学习的技术已被引入轨迹优化领域。深度神经网络(DNN)被训练并用作常规优化过程的替代物。他们可以提供低推力(LT)转移成本估算,并实现更复杂的初步任务设计。但是,有效获取培训所需数量的轨迹数据是一个挑战。生成对抗网络(GAN)适用于有效生成可行的LT轨迹数据。 GAN由生成器和一个歧视器组成,它们都是深网。发电机使用随机噪声作为输入生成假LT传输功能,而鉴别器则将发电机的假LT传输功能与真实LT传输功能区分开。对GAN进行训练,直到发电机生成鉴别器无法识别的假LT转移。这表明发电机生成低推力传输特征,其分布与真实传输特征相同。生成的低推力传输数据具有很高的收敛速率,并且可以用于有效地为深度学习模型生成训练数据。通过在接近地球(NEA)任务方案中产生可行的LT转移来验证所提出的方法。 GAN生成样品的收敛速率为84.3%。
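As a dependency-free caricature of the alternating training described above, the sketch below pits a one-parameter "generator" (a single scalar output, no noise input) against a logistic-regression discriminator on a single real scalar. The models, learning rate, and data are all toy assumptions, far simpler than the deep networks used for LT transfer features, but the update structure is the same: the discriminator is pushed to separate real from fake, then the generator is pushed to fool it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

REAL = 5.0           # the "real transfer feature" to imitate
theta = 0.0          # toy generator: its output is just this parameter
w, b = 0.0, 0.0      # logistic discriminator d(x) = sigmoid(w*x + b)
lr = 0.1

for _ in range(500):
    # Discriminator ascent step: push d(REAL) -> 1 and d(theta) -> 0.
    d_real = sigmoid(w * REAL + b)
    d_fake = sigmoid(w * theta + b)
    w += lr * ((1 - d_real) * REAL - d_fake * theta)
    b += lr * ((1 - d_real) - d_fake)
    # Generator ascent step: push d(theta) -> 1 (fool the discriminator).
    d_fake = sigmoid(w * theta + b)
    theta += lr * (1 - d_fake) * w
```

The generator output drifts toward the real value as the discriminator repeatedly learns to separate them, mirroring the train-until-indistinguishable loop in the abstract.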
We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker caption contest. Specifically, we develop three carefully circumscribed tasks for which it suffices to grasp the potentially complex and unexpected relationships between images and captions, as well as the similarly complex and unexpected allusions to the wide variety of human experience; these are the hallmarks of a New Yorker-caliber cartoon. We investigate vision-and-language models that take the cartoon pixels and captions directly as input, as well as language-only models that circumvent image processing by being provided only textual descriptions of the image. Even when we provide rich, multifaceted annotations of the cartoon images, we identify a performance gap between high-quality machine learning models (e.g., a fine-tuned 175B-parameter language model) and humans. We publicly release our corpus, including annotations describing the image's locations/entities, what is unusual about the scene, and explanations of the jokes.
High-quality observations of the real world are crucial for a variety of applications, including producing 3D-printed replicas of small-scale scenes and performing inspections of large-scale infrastructure. These 3D observations are commonly obtained by combining multiple sensor measurements from different views. Guiding the selection of suitable views is known as the next best view (NBV) planning problem. Most NBV approaches reason about measurements using rigid data structures (e.g., surface meshes or voxel grids). This simplifies next best view selection, but can be computationally expensive, reduces real-world fidelity, and couples next best view selection to the final data processing. This paper presents the Surface Edge Explorer (SEE), an NBV approach that selects new observations directly from previous sensor measurements without requiring rigid data structures. SEE uses measurement density to propose next best views that increase coverage of insufficiently observed surfaces while avoiding potential occlusions. Statistical results from simulated experiments show that SEE can attain better surface coverage in less computation time and with less sensor travel distance than the evaluated volumetric approaches on both small- and large-scale scenes. Real-world experiments demonstrate SEE autonomously observing a deer statue using a 3D sensor affixed to a robotic arm.
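The density-driven view proposal can be caricatured as "look where measurements are sparsest". The sketch below scores each measured point by its neighbour count within a radius and proposes the sparsest one as the centre of the next observation; the radius, data, and omission of occlusion handling are all simplifications of SEE's actual method.

```python
def density(points, i, r=1.0):
    """Number of other measurements within radius r of points[i]."""
    xi, yi, zi = points[i]
    return sum(
        (x - xi) ** 2 + (y - yi) ** 2 + (z - zi) ** 2 <= r * r
        for j, (x, y, z) in enumerate(points) if j != i)

def next_best_view(points, r=1.0):
    """Propose the measurement with the lowest local density as the
    target of the next observation (toy stand-in for SEE's
    density-driven view proposal)."""
    return min(range(len(points)), key=lambda i: density(points, i, r))

# A densely observed patch near the origin and one isolated measurement:
# the under-observed spot should be selected.
pts = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (5, 5, 5)]
idx = next_best_view(pts)
```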
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical computer vision tasks, including pose estimation, object detection, depth estimation, and image generation; vision-and-language tasks, such as region captioning and referring expression comprehension; and natural language processing tasks, such as question answering and paraphrasing. Developing a single unified model raises unique challenges due to the heterogeneous inputs and outputs pertaining to each task, including RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture on over 80 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark and produces strong results across 16 diverse benchmarks, such as NYUv2-Depth, ImageNet, VQA 2.0, OK-VQA, SWiG, VizWiz, BoolQ, and SciTail, with no task-specific or benchmark-specific fine-tuning. A demo of Unified-IO is available at https://unified-io.allenai.org.
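The token unification step can be sketched for one output type: quantising bounding-box coordinates into discrete location tokens that share the model's vocabulary. The bin count and token naming below are assumptions for illustration, not Unified-IO's exact scheme.

```python
def box_to_tokens(box, image_size, bins=1000):
    """Quantise an (x0, y0, x1, y1) box into discrete coordinate
    tokens '<loc_k>': normalise each coordinate by the image size,
    then bucket it into one of `bins` bins. This is how structured
    outputs can be expressed as ordinary vocabulary tokens."""
    w, h = image_size
    coords = [box[0] / w, box[1] / h, box[2] / w, box[3] / h]
    return [f"<loc_{min(bins - 1, int(c * bins))}>" for c in coords]

tokens = box_to_tokens((32, 64, 96, 128), image_size=(256, 256))
```

The same idea extends to per-pixel maps and masks via image tokenisers, so that every task reduces to sequence-to-sequence prediction over one shared vocabulary.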